CatBoost: gradient boosting with categorical features support

نویسندگان

  • Anna Veronika Dorogush
  • Vasily Ershov
  • Andrey Gulin
چکیده

In this paper we present CatBoost, a new open-sourced gradient boosting library that successfully handles categorical features and outperforms existing publicly available implementations of gradient boosting in terms of quality on a set of popular publicly available datasets. The library has a GPU implementation of learning algorithm and a CPU implementation of scoring algorithm, which are significantly faster than other gradient boosting libraries on ensembles of similar sizes.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Amazon Employee Access Control System

In this work, based on the history data of 20102011 from Amazon Inc., we build up a system which aims to take place of resource administrators at Amazon. Our analysis shows that the given dataset is highly imbalanced with categorical values. Thus in the preprocessing step, we tried different sampling methods, feature selection as well as one hot encoding to make the data more suitable for predi...

متن کامل

Machine Learning Models for Housing Prices Forecasting using Registration Data

This article has been compiled to identify the best model of housing price forecasting using machine learning methods with maximum accuracy and minimum error. Five important machine learning algorithms are used to predict housing prices, including Nearest Neighbor Regression Algorithm (KNNR), Support Vector Regression Algorithm (SVR), Random Forest Regression Algorithm (RFR), Extreme Gradient B...

متن کامل

Predicting customer behaviour: The University of Melbourne's KDD Cup report

We discuss the challenges of the 2009 KDD Cup along with our ideas and methodologies for modelling the problem. The main stages included aggressive nonparametric feature selection, careful treatment of categorical variables and tuning a gradient boosting machine under Bernoulli loss with trees.

متن کامل

Modeling MOOC Dropouts

In this project, we model MOOC dropouts using user activity data. We have several rounds of feature engineering and generate features like activity counts, percentage of visited course objects, and session counts to model this problem. We apply logistic regression, support vector machine, gradient boosting decision trees, AdaBoost, and random forest to this classification problem. Our best mode...

متن کامل

CSE 255 Assignment 2 : Upvotes Prediction for Reddit Submissions

In this paper we consider models for predicting the number of upvotes on a reddit submission. We examine features such as the number of votes, number of comments, time of submission, upvote history of users, images, and subreddits of the submission. We compare Support Vector Regression, Linear Regression, and Gradient Boosting Regression models for predicting the number of upvotes.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017